Wafer-scale integration and two-level pipelined implementations of systolic arrays
Authors
Abstract
This paper addresses two important issues in systolic array designs. How do we provide fault-tolerance in systolic arrays for yield enhancement in wafer-scale integration implementations? And how do we design efficient systolic arrays with two levels of pipelining? The first level refers to the pipelined organization of the array at the cellular level, and the second refers to the pipelined functional units inside the cells. The fault-tolerant scheme we propose replaces defective cells with clocked delays. This has the distinct characteristic that data can flow through the array with faulty cells at the original clock speed. We will show that both the defective cells under this fault-tolerant scheme and the second-level pipeline stages can simply be modeled as additional delays in the data paths of "generic" systolic designs. We introduce the mathematical notion of a cut to solve the problem of how to allow for these extra delays while preserving the correctness of the original systolic array designs. The results obtained by applying the techniques described in this paper are encouraging. When applied to systolic arrays without feedback cycles, the arrays can tolerate large numbers of failures (with the addition of very little hardware) while maintaining the original throughput. Furthermore, all of the pipeline stages in the cells can be kept fully utilized through the addition of a small number of delay registers. However, adding delays to systolic arrays with cycles typically induces a significant decrease in throughput. In response to this, we have derived a new class of systolic algorithms in which the data cycle around a ring of processing cells.
The systolic ring architecture has the property that its performance degrades gracefully as cells fail. Using our cut theory and ring architectures for arrays with feedback, we have effective fault-tolerant and two-level pipelining schemes for most systolic arrays. As a side effect of developing the ring architecture approach, we have derived several new systolic algorithms. These algorithms generally require only one-third to one-half of the number of cells used in previous designs to achieve the same throughput. The new systolic algorithms include ones for LU-decomposition, QR-decomposition and the solution of triangular linear systems.
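The idea of replacing a defective cell with a clocked delay can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes a simple feedback-free linear array that computes an FIR filter, with weights held in the cells and the x and y streams both moving left to right (x through two registers per cell, y through one). The names `delay`, `run_array` and `fir` are hypothetical. A defective cell is marked `None` and is replaced by one register on each data path, so every path crossing that point gains the same extra delay. Under these assumptions the output stream is unchanged apart from added latency.

```python
from typing import List, Optional, Sequence


def delay(stream: Sequence[int], d: int) -> List[int]:
    """Pass a stream through d clocked registers that initially hold 0."""
    return ([0] * d + list(stream))[: len(stream)]


def run_array(cells: Sequence[Optional[int]], xs: Sequence[int]) -> List[int]:
    """Simulate the assumed linear array, one list entry per clock cycle.

    cells: one entry per physical cell; an int is the weight held by a
           working cell, None marks a defective cell replaced by a delay.
    """
    x_stream = list(xs)
    y_stream = [0] * len(xs)
    for w in cells:
        if w is None:
            # Defective cell: bypass with one register on each data path.
            x_stream, y_stream = delay(x_stream, 1), delay(y_stream, 1)
        else:
            # Working cell: y is updated and registered once, x twice.
            y_stream = delay([y + w * x for x, y in zip(x_stream, y_stream)], 1)
            x_stream = delay(x_stream, 2)
    return y_stream


def fir(weights: Sequence[int], xs: Sequence[int]) -> List[int]:
    """Reference result: y[t] = sum over k of weights[k] * x[t - k]."""
    return [
        sum(w * xs[t - k] for k, w in enumerate(weights) if t - k >= 0)
        for t in range(len(xs))
    ]


weights = [3, -1, 2]
xs = [1, 2, 3, 4, 5] + [0] * 10
reference = fir(weights, xs)

healthy = run_array([3, -1, 2], xs)            # 3 physical cells, none defective
faulty = run_array([3, None, -1, None, 2], xs)  # 5 physical cells, 2 defective

# Same results, one value per clock; only the latency (one cycle per
# physical cell in this design) differs.
assert healthy == delay(reference, 3)
assert faulty == delay(reference, 5)
```

In this sketch the array with two bypassed cells still produces one result per clock cycle. Only the latency grows, by one cycle per bypassed cell, which matches the abstract's claim for arrays without feedback cycles. Arrays with feedback are not covered by this example.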
Similar resources
Homogeneous VLSI structures for high speed digital signal processing using number theoretic techniques
Exact computations, performed with residues, occur in Number Theoretic Transforms and Residue Number System implementations. Once thought awkward to implement with standard logic circuits, the application of efficient small lookup tables, constructed with pipelined dynamic ROM's, allows very efficient construction of hardware ideally suited to residue operations. Linear DSP operations that are ...
Highly Concurrent VLSI Computing Structures for DCA
In this paper highly concurrent pipelined computing structures based on a constrained digital contour smoothing are described. The smoothing minimizes the undersampling, digitizing and quantizing error and so it is able to improve the stability of invariants calculation. The word-level and bit-level systolic arrays for completely pipelined calculation of the constrained least-squares digital co...
Fault-tolerance and two-level pipelining in VLSI systolic arrays
This paper addresses two important issues in systolic array designs: fault-tolerance and two-level pipelining. The proposed "systolic" fault-tolerant scheme maintains the original data flow pattern by bypassing defective cells with a few registers. As a result, many of the desirable properties of systolic arrays (such as local and regular communication between cells) are preserved. Two-level pi...
Efficient implementation of low time complexity and pipelined bit-parallel polynomial basis multiplier over binary finite fields
This paper presents two efficient implementations of fast and pipelined bit-parallel polynomial basis multipliers over GF(2^m) by irreducible pentanomials and trinomials. The architecture of the first multiplier is based on a parallel and independent computation of powers of the polynomial variable. In the second structure only even powers of the polynomial variable are used. The par...
High-Rate Viterbi Processor: A Systolic Array Solution
In exploiting the potentials of highly parallel architectures to speed up the computation rate of systems enabled by VLSI, special attention has to be paid to designing algorithms such that they can be mapped onto parallel hardware. The main part of the Viterbi algorithm (VA) is a nonlinear feedback loop, the ACS recursion (add-compare-select recursion), which presents a bottleneck for high-spe...
Journal title:
- J. Parallel Distrib. Comput.
Volume 1 Issue
Pages -
Publication date 1984